The Role of Biomedical Dataset in Classification
نویسندگان
چکیده
In this paper, we investigate the role of a biomedical dataset on the classification accuracy of an algorithm. We quantify the complexity of a biomedical dataset using five complexity measures: correlation-based feature selection subset merit, noise, imbalance ratio, missing values and information gain. The effect of these complexity measures on classification accuracy is evaluated using five diverse machine learning algorithms: J48 (decision tree), SMO (support vector machines), Naive Bayes (probabilistic), IBk (instance based learner) and JRIP (rule-based induction). The results of our experiments show that noise and correlation-based feature selection subset merit – not a particular choice of algorithm – play a major role in determining the classification accuracy. In the end, we provide researchers with a meta-model and an empirical equation to estimate the classification potential of a dataset on the basis of its complexity. This well help researchers to efficiently pre-process the dataset for automatic knowledge extraction.
منابع مشابه
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
متن کاملDiagnosis of Diabetes Using an Intelligent Approach Based on Bi-Level Dimensionality Reduction and Classification Algorithms
Objective: Diabetes is one of the most common metabolic diseases. Earlier diagnosis of diabetes and treatment of hyperglycemia and related metabolic abnormalities is of vital importance. Diagnosis of diabetes via proper interpretation of the diabetes data is an important classification problem. Classification systems help the clinicians to predict the risk factors that cause the diabetes or pre...
متن کاملClassification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset
Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...
متن کاملConditional Random Fields for Airborne Lidar Point Cloud Classification in Urban Area
Over the past decades, urban growth has been known as a worldwide phenomenon that includes widening process and expanding pattern. While the cities are changing rapidly, their quantitative analysis as well as decision making in urban planning can benefit from two-dimensional (2D) and three-dimensional (3D) digital models. The recent developments in imaging and non-imaging sensor technologies, s...
متن کاملA Modified Grey Wolf Optimizer by Individual Best Memory and Penalty Factor for Sonar and Radar Dataset Classification
Meta-heuristic Algorithms (MA) are widely accepted as excellent ways to solve a variety of optimization problems in recent decades. Grey Wolf Optimization (GWO) is a novel Meta-heuristic Algorithm (MA) that has been generated a great deal of research interest due to its advantages such as simple implementation and powerful exploitation. This study proposes a novel GWO-based MA and two extra fea...
متن کامل